Overview

Brought to you by YData

Dataset statistics

                                 Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    434          442
Missing cells (%)                8.1%         8.3%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B
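The figures in this table can be cross-checked against the per-variable sections below. A minimal sketch, assuming that Missing cells (%) is the missing-cell count divided by observations times variables and that the average record size is total memory divided by observations (both assumptions reproduce the numbers shown). The per-column counts are copied from this report; the constant and helper names are illustrative.

```python
# Missing-cell counts per column, copied from this report (all other columns have 0 missing).
MISSING_PER_COLUMN = {
    "Dataset A": {"Age": 87, "Cabin": 345, "Embarked": 2},
    "Dataset B": {"Age": 87, "Cabin": 354, "Embarked": 1},
}
N_OBSERVATIONS = 446
N_VARIABLES = 12
TOTAL_MEMORY_BYTES = 45.3 * 1024  # "45.3 KiB" as displayed (already rounded)


def missing_cells(per_column, n_rows, n_cols):
    """Return (number of missing cells, missing cells as a percentage of all cells)."""
    missing = sum(per_column.values())
    return missing, 100 * missing / (n_rows * n_cols)


for name, per_column in MISSING_PER_COLUMN.items():
    count, pct = missing_cells(per_column, N_OBSERVATIONS, N_VARIABLES)
    print(f"{name}: {count} missing cells ({pct:.1f}%)")

# Average record size: total memory divided by the number of observations.
print(f"Average record size: {TOTAL_MEMORY_BYTES / N_OBSERVATIONS:.1f} B")
```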

Variable types

              Dataset A    Dataset B
Numeric       5            5
Categorical   4            4
Text          3            3

Alerts

Type               Dataset A                                        Dataset B
High correlation   Sex is highly overall correlated with Survived   Alert not present in this dataset
High correlation   Survived is highly overall correlated with Sex   Alert not present in this dataset
Missing            Age has 87 (19.5%) missing values                Age has 87 (19.5%) missing values
Missing            Cabin has 345 (77.4%) missing values             Cabin has 354 (79.4%) missing values
Unique             PassengerId has unique values                    PassengerId has unique values
Unique             Name has unique values                           Name has unique values
Zeros              SibSp has 304 (68.2%) zeros                      SibSp has 296 (66.4%) zeros
Zeros              Parch has 337 (75.6%) zeros                      Parch has 341 (76.5%) zeros
Zeros              Fare has 6 (1.3%) zeros                          Fare has 9 (2.0%) zeros
High correlation   Alert not present in this dataset                Fare is highly overall correlated with SibSp
High correlation   Alert not present in this dataset                SibSp is highly overall correlated with Fare

Reproduction

                          Dataset A                    Dataset B
Analysis started          2024-10-16 08:47:10.498563   2024-10-16 08:47:13.606847
Analysis finished         2024-10-16 08:47:13.603553   2024-10-16 08:47:16.740179
Duration                  3.1 seconds                  3.13 seconds
Software version          ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration    config.json                  config.json
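The Duration row is the difference between the start and finish timestamps. A small sketch using only the standard library; the three-significant-digit display (`.3g`) is an assumption about how the report formats durations, chosen because it reproduces both "3.1" and "3.13". The helper name is illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

RUNS = {
    "Dataset A": ("2024-10-16 08:47:10.498563", "2024-10-16 08:47:13.603553"),
    "Dataset B": ("2024-10-16 08:47:13.606847", "2024-10-16 08:47:16.740179"),
}


def duration_seconds(started, finished):
    """Elapsed seconds between two timestamps as printed in the Reproduction table."""
    return (datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)).total_seconds()


for name, (start, end) in RUNS.items():
    seconds = duration_seconds(start, end)
    # .3g keeps three significant digits and drops trailing zeros
    print(f"{name}: {seconds:.6f} s, displayed as {seconds:.3g} seconds")
```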

Variables

PassengerId
Real number (ℝ)

               Dataset A    Dataset B
Distinct       446          446
Distinct (%)   100.0%       100.0%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           466.20852    445.96637
Minimum        1            2
Maximum        891          885
Zeros          0            0
Zeros (%)      0.0%         0.0%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A    Dataset B
Minimum                     1            2
5-th percentile             44.25        51.25
Q1                          247.75       229.5
median                      478          447
Q3                          694.5        656.75
95-th percentile            849.25       827.5
Maximum                     891          885
Range                       890          883
Interquartile range (IQR)   446.75       427.25

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                257.54667      251.05841
Coefficient of variation (CV)     0.55242806     0.56295368
Kurtosis                          -1.18671       -1.1765655
Mean                              466.20852      445.96637
Median Absolute Deviation (MAD)   222            216
Skewness                          -0.11634189    -0.028202166
Sum                               207929         198901
Variance                          66330.287      63030.325
Monotonicity                      Not monotonic  Not monotonic
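Several of these statistics are simple functions of one another, which allows a quick consistency check. The sketch below recomputes mean, variance, coefficient of variation, range and IQR for PassengerId from values copied out of the tables above. The dictionary layout and the `derived_statistics` helper are illustrative, not part of ydata-profiling; because the reported standard deviation is itself rounded, the recomputed variance and CV agree only to the displayed precision.

```python
# PassengerId summary values copied from the tables above.
PASSENGER_ID = {
    "Dataset A": {"n": 446, "sum": 207929, "std": 257.54667,
                  "min": 1, "max": 891, "q1": 247.75, "q3": 694.5},
    "Dataset B": {"n": 446, "sum": 198901, "std": 251.05841,
                  "min": 2, "max": 885, "q1": 229.5, "q3": 656.75},
}


def derived_statistics(s):
    """Recompute statistics that follow directly from other reported values."""
    mean = s["sum"] / s["n"]
    return {
        "mean": mean,                  # Sum / number of observations
        "variance": s["std"] ** 2,     # square of the standard deviation
        "cv": s["std"] / mean,         # coefficient of variation
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],      # interquartile range
    }


for name, summary in PASSENGER_ID.items():
    print(name, {k: round(v, 5) for k, v in derived_statistics(summary).items()})
```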
[Figure: Histogram with fixed size bins (bins=50)]
Common Values

Dataset A
Value                 Count   Frequency (%)
767                   1       0.2%
517                   1       0.2%
692                   1       0.2%
536                   1       0.2%
495                   1       0.2%
213                   1       0.2%
399                   1       0.2%
818                   1       0.2%
886                   1       0.2%
108                   1       0.2%
Other values (436)    436     97.8%

Dataset B
Value                 Count   Frequency (%)
738                   1       0.2%
157                   1       0.2%
222                   1       0.2%
773                   1       0.2%
875                   1       0.2%
386                   1       0.2%
696                   1       0.2%
571                   1       0.2%
788                   1       0.2%
594                   1       0.2%
Other values (436)    436     97.8%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
6       1       0.2%
8       1       0.2%
9       1       0.2%
10      1       0.2%
11      1       0.2%
12      1       0.2%
14      1       0.2%
15      1       0.2%

Dataset B
Value   Count   Frequency (%)
2       1       0.2%
3       1       0.2%
8       1       0.2%
9       1       0.2%
10      1       0.2%
12      1       0.2%
13      1       0.2%
16      1       0.2%
18      1       0.2%
22      1       0.2%

Survived
Categorical

               Dataset A    Dataset B
Distinct       2            2
Distinct (%)   0.4%         0.4%
Missing        0            0
Missing (%)    0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB
Value   Dataset A    Dataset B
0       279          282
1       167          164

Length

                Dataset A    Dataset B
Max length      1            1
Median length   1            1
Mean length     1            1
Min length      1            1

Characters and Unicode

                      Dataset A    Dataset B
Total characters      446          446
Distinct characters   2            2
Distinct categories   1            1
Distinct scripts      1            1
Distinct blocks       1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
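To show what one such property looks like, the sketch below uses Python's standard `unicodedata.category`, which returns the two-letter general category of a single character (for example "Lu" for an uppercase letter or "Nd" for a decimal digit). The standard library has no script or block lookup, so only the category property is illustrated; the helper name is illustrative and the sample value is a Ticket value from this report.

```python
import unicodedata


def general_categories(text):
    """Map each character of a value to its Unicode general category."""
    return {ch: unicodedata.category(ch) for ch in text}


# "Lu" = uppercase letter, "Po" = other punctuation, "Zs" = space separator, "Nd" = decimal digit
print(general_categories("A/5 21174"))
```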

Unique

             Dataset A    Dataset B
Unique       0            0
Unique (%)   0.0%         0.0%

Sample

          Dataset A    Dataset B
1st row   1            1
2nd row   1            0
3rd row   1            0
4th row   0            1
5th row   0            0

Common Values

Dataset A
Value   Count   Frequency (%)
0       279     62.6%
1       167     37.4%

Dataset B
Value   Count   Frequency (%)
0       282     63.2%
1       164     36.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value   Count   Frequency (%)
0       279     62.6%
1       167     37.4%

Dataset B
Value   Count   Frequency (%)
0       282     63.2%
1       164     36.8%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters tables above.

Pclass
Categorical

               Dataset A    Dataset B
Distinct       3            3
Distinct (%)   0.7%         0.7%
Missing        0            0
Missing (%)    0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB
Value   Dataset A    Dataset B
3       251          256
1       109          104
2       86           86

Length

                Dataset A    Dataset B
Max length      1            1
Median length   1            1
Mean length     1            1
Min length      1            1

Characters and Unicode

                      Dataset A    Dataset B
Total characters      446          446
Distinct characters   3            3
Distinct categories   1            1
Distinct scripts      1            1
Distinct blocks       1            1

Unique

             Dataset A    Dataset B
Unique       0            0
Unique (%)   0.0%         0.0%

Sample

          Dataset A    Dataset B
1st row   2            3
2nd row   3            2
3rd row   2            2
4th row   3            2
5th row   3            2

Common Values

Dataset A
Value   Count   Frequency (%)
3       251     56.3%
1       109     24.4%
2       86      19.3%

Dataset B
Value   Count   Frequency (%)
3       256     57.4%
1       104     23.3%
2       86      19.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value   Count   Frequency (%)
3       251     56.3%
1       109     24.4%
2       86      19.3%

Dataset B
Value   Count   Frequency (%)
3       256     57.4%
1       104     23.3%
2       86      19.3%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   446     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   446     100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters tables above.

Name
Text

               Dataset A    Dataset B
Distinct       446          446
Distinct (%)   100.0%       100.0%
Missing        0            0
Missing (%)    0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Length

                Dataset A    Dataset B
Max length      82           57
Median length   49           48
Mean length     26.975336    27.152466
Min length      12           12

Characters and Unicode

                      Dataset A    Dataset B
Total characters      12031        12110
Distinct characters   59           58
Distinct categories   1            1
Distinct scripts      1            1
Distinct blocks       1            1

Unique

             Dataset A    Dataset B
Unique       446          446
Unique (%)   100.0%       100.0%

Sample

          Dataset A                      Dataset B
1st row   Lemore, Mrs. (Amelia Milley)   Gilnagh, Miss. Katherine "Katie"
2nd row   Karun, Miss. Manca             Bracken, Mr. James H
3rd row   Hart, Miss. Eva Miriam         Mack, Mrs. (Mary)
4th row   Stanley, Mr. Edward Roland     Abelson, Mrs. Samuel (Hannah Wizosky)
5th row   Perkin, Mr. John Henry         Davies, Mr. Charles Henry

Words

Dataset A
Value                 Count   Frequency (%)
mr                    261     14.3%
miss                  94      5.2%
mrs                   60      3.3%
william               36      2.0%
john                  23      1.3%
master                20      1.1%
henry                 18      1.0%
james                 14      0.8%
george                13      0.7%
mary                  13      0.7%
Other values (897)    1267    69.7%

Dataset B
Value                 Count   Frequency (%)
mr                    268     14.7%
miss                  83      4.6%
mrs                   68      3.7%
william               30      1.6%
master                23      1.3%
henry                 22      1.2%
john                  20      1.1%
charles               13      0.7%
james                 13      0.7%
mary                  12      0.7%
Other values (892)    1270    69.7%
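The word tables appear to be built from lowercased tokens with leading and trailing punctuation removed: `Mr.` is counted as `mr`, and in the Ticket section `C.A.` appears as `c.a` and `W./C.` as `w./c`. The sketch below is a hypothetical approximation of that tokenization, not ydata-profiling's own code; it splits on whitespace, lowercases, and strips `string.punctuation` from both ends of each token. The sample values are taken from the Sample tables in this report.

```python
import string
from collections import Counter


def word_counts(values):
    """Approximate word table: lowercase, split on whitespace, strip edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:  # skip tokens that were pure punctuation
                counts[token] += 1
    return counts


# The five sample names of Dataset A from the Sample table above.
sample_names = [
    "Lemore, Mrs. (Amelia Milley)",
    "Karun, Miss. Manca",
    "Hart, Miss. Eva Miriam",
    "Stanley, Mr. Edward Roland",
    "Perkin, Mr. John Henry",
]
print(word_counts(sample_names).most_common(3))
print(word_counts(["F.C.C. 13529", "A/5 21174"]))
```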

Most occurring characters

Dataset A
Value                 Count   Frequency (%)
(space)               1375    11.4%
r                     963     8.0%
e                     839     7.0%
a                     796     6.6%
n                     671     5.6%
i                     663     5.5%
s                     651     5.4%
M                     553     4.6%
l                     536     4.5%
o                     518     4.3%
Other values (49)     4466    37.1%

Dataset B
Value                 Count   Frequency (%)
(space)               1378    11.4%
r                     982     8.1%
a                     856     7.1%
e                     849     7.0%
n                     670     5.5%
i                     645     5.3%
s                     642     5.3%
M                     568     4.7%
l                     535     4.4%
o                     488     4.0%
Other values (48)     4497    37.1%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   12031   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   12110   100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   12031   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   12110   100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   12031   100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   12110   100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters tables above.

Sex
Categorical

               Dataset A    Dataset B
Distinct       2            2
Distinct (%)   0.4%         0.4%
Missing        0            0
Missing (%)    0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB
Value    Dataset A    Dataset B
male     292          294
female   154          152

Length

                Dataset A    Dataset B
Max length      6            6
Median length   4            4
Mean length     4.690583     4.6816143
Min length      4            4

Characters and Unicode

                      Dataset A    Dataset B
Total characters      2092         2088
Distinct characters   5            5
Distinct categories   1            1
Distinct scripts      1            1
Distinct blocks       1            1

Unique

             Dataset A    Dataset B
Unique       0            0
Unique (%)   0.0%         0.0%

Sample

          Dataset A    Dataset B
1st row   female       female
2nd row   female       male
3rd row   female       female
4th row   male         female
5th row   male         male

Common Values

Dataset A
Value    Count   Frequency (%)
male     292     65.5%
female   154     34.5%

Dataset B
Value    Count   Frequency (%)
male     294     65.9%
female   152     34.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Dataset A
Value   Count   Frequency (%)
e       600     28.7%
m       446     21.3%
a       446     21.3%
l       446     21.3%
f       154     7.4%

Dataset B
Value   Count   Frequency (%)
e       598     28.6%
m       446     21.4%
a       446     21.4%
l       446     21.4%
f       152     7.3%
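Because Sex takes only two values, every figure in its Length and character tables can be derived from the two counts: total characters is the sum of length times count, mean length divides that by the number of observations, and each occurrence of a value contributes its characters (so `e` counts once for each "male" and twice for each "female"). A sketch with illustrative names; the counts are copied from the Common Values tables above.

```python
from collections import Counter

# Value counts for Sex, copied from the Common Values tables above.
SEX_COUNTS = {
    "Dataset A": {"male": 292, "female": 154},
    "Dataset B": {"male": 294, "female": 152},
}


def length_and_characters(counts):
    """Total characters, mean length and per-character counts derived from value counts."""
    n_values = sum(counts.values())
    total_chars = sum(len(value) * n for value, n in counts.items())
    chars = Counter()
    for value, n in counts.items():
        for ch in value:
            chars[ch] += n  # every occurrence of the value contributes its characters
    return total_chars, total_chars / n_values, chars


for name, counts in SEX_COUNTS.items():
    total, mean_len, chars = length_and_characters(counts)
    print(name, total, round(mean_len, 6), chars.most_common())
```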

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   2092    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2088    100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   2092    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2088    100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   2092    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   2088    100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters tables above.

Age
Real number (ℝ)

               Dataset A    Dataset B
Distinct       72           81
Distinct (%)   20.1%        22.6%
Missing        87           87
Missing (%)    19.5%        19.5%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           29.646936    29.517187
Minimum        0.42         0.67
Maximum        74           80
Zeros          0            0
Zeros (%)      0.0%         0.0%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A    Dataset B
Minimum                     0.42         0.67
5-th percentile             3            4
Q1                          20           20
median                      28           28
Q3                          39           38
95-th percentile            56.1         54.1
Maximum                     74           80
Range                       73.58        79.33
Interquartile range (IQR)   19           18

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                15.164182      14.679815
Coefficient of variation (CV)     0.51149238     0.49733109
Kurtosis                          -0.090554845   0.10730441
Mean                              29.646936      29.517187
Median Absolute Deviation (MAD)   9              9
Skewness                          0.31833731     0.33728083
Sum                               10643.25       10596.67
Variance                          229.95241      215.49696
Monotonicity                      Not monotonic  Not monotonic
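Age is missing in 87 rows of each dataset, and its figures match a split of denominators: Mean equals Sum divided by the 359 non-missing values, Distinct (%) is also relative to those 359 values, while Missing (%) is relative to all 446 rows. The sketch below recomputes these from values copied out of the tables above; the reading of the denominators is inferred from the numbers, and the helper name is illustrative.

```python
# Age summary values copied from the tables above.
N_ROWS = 446
AGE = {
    "Dataset A": {"missing": 87, "distinct": 72, "sum": 10643.25, "min": 0.42, "max": 74},
    "Dataset B": {"missing": 87, "distinct": 81, "sum": 10596.67, "min": 0.67, "max": 80},
}


def age_checks(s, n_rows=N_ROWS):
    """Recompute Age statistics, using non-missing values where the report appears to."""
    present = n_rows - s["missing"]  # number of non-missing values
    return {
        "present": present,
        "mean": s["sum"] / present,
        "distinct_pct": 100 * s["distinct"] / present,
        "missing_pct": 100 * s["missing"] / n_rows,
        "range": s["max"] - s["min"],
    }


for name, summary in AGE.items():
    print(name, {k: round(v, 4) for k, v in age_checks(summary).items()})
```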
[Figure: Histogram with fixed size bins (bins=50)]
Common Values

Dataset A
Value                 Count   Frequency (%)
25                    15      3.4%
21                    13      2.9%
19                    13      2.9%
26                    13      2.9%
24                    13      2.9%
29                    11      2.5%
18                    11      2.5%
22                    10      2.2%
27                    10      2.2%
30                    10      2.2%
Other values (62)     240     53.8%
(Missing)             87      19.5%

Dataset B
Value                 Count   Frequency (%)
28                    17      3.8%
22                    14      3.1%
30                    13      2.9%
25                    13      2.9%
18                    13      2.9%
24                    13      2.9%
36                    12      2.7%
19                    11      2.5%
21                    11      2.5%
26                    11      2.5%
Other values (71)     231     51.8%
(Missing)             87      19.5%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0.42    1       0.2%
0.75    1       0.2%
0.83    2       0.4%
0.92    1       0.2%
1       5       1.1%
2       6       1.3%
3       5       1.1%
4       5       1.1%
5       3       0.7%
6       1       0.2%

Dataset B
Value   Count   Frequency (%)
0.67    1       0.2%
0.75    1       0.2%
0.83    1       0.2%
0.92    1       0.2%
1       4       0.9%
2       5       1.1%
3       3       0.7%
4       6       1.3%
5       3       0.7%
6       1       0.2%

SibSp
Real number (ℝ)

               Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.52466368   0.55829596
Minimum        0            0
Maximum        8            8
Zeros          304          296
Zeros (%)      68.2%        66.4%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A    Dataset B
Minimum                     0            0
5-th percentile             0            0
Q1                          0            0
median                      0            0
Q3                          1            1
95-th percentile            3            3
Maximum                     8            8
Range                       8            8
Interquartile range (IQR)   1            1

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                1.0864112      1.1117193
Coefficient of variation (CV)     2.0706811      1.9912723
Kurtosis                          16.948289      15.214249
Mean                              0.52466368     0.55829596
Median Absolute Deviation (MAD)   0              0
Skewness                          3.5645831      3.3814231
Sum                               234            249
Variance                          1.1802892      1.2359198
Monotonicity                      Not monotonic  Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]
Common Values

Dataset A
Value   Count   Frequency (%)
0       304     68.2%
1       103     23.1%
2       15      3.4%
3       10      2.2%
4       8       1.8%
5       3       0.7%
8       3       0.7%

Dataset B
Value   Count   Frequency (%)
0       296     66.4%
1       108     24.2%
2       16      3.6%
4       10      2.2%
3       10      2.2%
5       3       0.7%
8       3       0.7%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0       304     68.2%
1       103     23.1%
2       15      3.4%
3       10      2.2%
4       8       1.8%
5       3       0.7%
8       3       0.7%

Dataset B
Value   Count   Frequency (%)
0       296     66.4%
1       108     24.2%
2       16      3.6%
3       10      2.2%
4       10      2.2%
5       3       0.7%
8       3       0.7%
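SibSp has only seven distinct values, so its moments can be recomputed from the value-count tables above. A sketch, assuming the report uses the sample variance with an n - 1 denominator; this assumption reproduces the reported variances 1.1802892 and 1.2359198. Names are illustrative.

```python
import math

# SibSp value counts copied from the Common Values tables above.
SIBSP_COUNTS = {
    "Dataset A": {0: 304, 1: 103, 2: 15, 3: 10, 4: 8, 5: 3, 8: 3},
    "Dataset B": {0: 296, 1: 108, 2: 16, 3: 10, 4: 10, 5: 3, 8: 3},
}


def moments_from_counts(counts):
    """Mean, sample variance, standard deviation, CV and share of zeros from value counts."""
    n = sum(counts.values())
    total = sum(value * c for value, c in counts.items())
    mean = total / n
    squared_dev = sum(c * (value - mean) ** 2 for value, c in counts.items())
    variance = squared_dev / (n - 1)  # sample variance (n - 1 denominator)
    std = math.sqrt(variance)
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,
        "p_zeros": counts.get(0, 0) / n,
    }


for name, counts in SIBSP_COUNTS.items():
    print(name, {k: round(v, 7) for k, v in moments_from_counts(counts).items()})
```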

Parch
Real number (ℝ)

               Dataset A    Dataset B
Distinct       6            7
Distinct (%)   1.3%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.39461883   0.38789238
Minimum        0            0
Maximum        5            6
Zeros          337          341
Zeros (%)      75.6%        76.5%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A    Dataset B
Minimum                     0            0
5-th percentile             0            0
Q1                          0            0
median                      0            0
Q3                          0            0
95-th percentile            2            2
Maximum                     5            6
Range                       5            6
Interquartile range (IQR)   0            0

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                0.79682951     0.80990306
Coefficient of variation (CV)     2.0192384      2.0879582
Kurtosis                          6.6046872      8.9272329
Mean                              0.39461883     0.38789238
Median Absolute Deviation (MAD)   0              0
Skewness                          2.3561864      2.5974465
Sum                               176            173
Variance                          0.63493727     0.65594296
Monotonicity                      Not monotonic  Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]
Common Values

Dataset A
Value   Count   Frequency (%)
0       337     75.6%
1       55      12.3%
2       47      10.5%
3       3       0.7%
5       2       0.4%
4       2       0.4%

Dataset B
Value   Count   Frequency (%)
0       341     76.5%
1       51      11.4%
2       47      10.5%
3       3       0.7%
4       2       0.4%
6       1       0.2%
5       1       0.2%

Minimum 10 values

Dataset A
Value   Count   Frequency (%)
0       337     75.6%
1       55      12.3%
2       47      10.5%
3       3       0.7%
4       2       0.4%
5       2       0.4%

Dataset B
Value   Count   Frequency (%)
0       341     76.5%
1       51      11.4%
2       47      10.5%
3       3       0.7%
4       2       0.4%
5       1       0.2%
6       1       0.2%
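The quantile rows for Parch can be reproduced by expanding the value counts back into 446 sorted observations. A sketch assuming linear interpolation between order statistics at position (n - 1) * q, which is the default `interpolation="linear"` of pandas' `Series.quantile`; whether ydata-profiling computes quantiles this way is an assumption, but the sketch yields the reported 0 for Q3 and 2 for the 95-th percentile in both datasets. Helper names are illustrative.

```python
import math

# Parch value counts copied from the Common Values tables above.
PARCH_COUNTS = {
    "Dataset A": {0: 337, 1: 55, 2: 47, 3: 3, 4: 2, 5: 2},
    "Dataset B": {0: 341, 1: 51, 2: 47, 3: 3, 4: 2, 5: 1, 6: 1},
}


def expand(counts):
    """Turn value counts back into a sorted list of observations."""
    return sorted(value for value, count in counts.items() for _ in range(count))


def quantile_linear(sorted_values, q):
    """Quantile by linear interpolation at position h = (n - 1) * q."""
    h = (len(sorted_values) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


LEVELS = [("5-th percentile", 0.05), ("Q1", 0.25), ("median", 0.5),
          ("Q3", 0.75), ("95-th percentile", 0.95), ("Maximum", 1.0)]

for name, counts in PARCH_COUNTS.items():
    values = expand(counts)
    print(name, {label: quantile_linear(values, q) for label, q in LEVELS})
```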

Ticket
Text

               Dataset A    Dataset B
Distinct       378          380
Distinct (%)   84.8%        85.2%
Missing        0            0
Missing (%)    0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Length

                Dataset A    Dataset B
Max length      18           18
Median length   17           17
Mean length     6.7578475    6.7892377
Min length      4            4

Characters and Unicode

                      Dataset A    Dataset B
Total characters      3014         3028
Distinct characters   35           32
Distinct categories   1            1
Distinct scripts      1            1
Distinct blocks       1            1

Unique

             Dataset A    Dataset B
Unique       329          331
Unique (%)   73.8%        74.2%

Sample

          Dataset A      Dataset B
1st row   C.A. 34260     35851
2nd row   349256         220367
3rd row   F.C.C. 13529   S.O./P.P. 3
4th row   A/4 45380      P/PP 3381
5th row   A/5 21174      S.O.C. 14879

Words

Dataset A
Value                 Count   Frequency (%)
pc                    27      4.8%
c.a                   12      2.1%
a/5                   9       1.6%
ca                    7       1.2%
ston/o                7       1.2%
2                     7       1.2%
w./c                  5       0.9%
soton/o.q             5       0.9%
f.c.c                 5       0.9%
4133                  4       0.7%
Other values (398)    479     84.5%

Dataset B
Value                 Count   Frequency (%)
pc                    25      4.4%
a/5                   13      2.3%
c.a                   11      1.9%
ca                    8       1.4%
w./c                  6       1.1%
soton/oq              5       0.9%
ston/o                5       0.9%
2                     5       0.9%
347082                5       0.9%
ston/o2               4       0.7%
Other values (402)    480     84.7%

Most occurring characters

Dataset A
Value                 Count   Frequency (%)
3                     383     12.7%
1                     349     11.6%
2                     279     9.3%
7                     250     8.3%
4                     224     7.4%
6                     210     7.0%
0                     209     6.9%
5                     193     6.4%
9                     166     5.5%
8                     144     4.8%
Other values (25)     607     20.1%

Dataset B
Value                 Count   Frequency (%)
3                     381     12.6%
1                     334     11.0%
2                     285     9.4%
7                     245     8.1%
4                     244     8.1%
6                     205     6.8%
0                     191     6.3%
5                     191     6.3%
9                     165     5.4%
8                     164     5.4%
Other values (22)     623     20.6%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   3014    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3028    100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   3014    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3028    100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   3014    100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   3028    100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters tables above.

Fare
Real number (ℝ)

               Dataset A    Dataset B
Distinct       178          177
Distinct (%)   39.9%        39.7%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           32.58066     31.42214
Minimum        0            0
Maximum        512.3292     512.3292
Zeros          6            9
Zeros (%)      1.3%         2.0%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A    Dataset B
Minimum                     0            0
5-th percentile             7.225        7.225
Q1                          7.8958       7.8958
median                      14.4542      14.4542
Q3                          30.6958      30.5
95-th percentile            120          110.38748
Maximum                     512.3292     512.3292
Range                       512.3292     512.3292
Interquartile range (IQR)   22.8         22.6042

Descriptive statistics

                                  Dataset A      Dataset B
Standard deviation                51.945576      49.948189
Coefficient of variation (CV)     1.5943684      1.5895858
Kurtosis                          35.564148      41.011683
Mean                              32.58066       31.42214
Median Absolute Deviation (MAD)   6.8042         6.7209
Skewness                          5.0014039      5.3397644
Sum                               14530.974      14014.274
Variance                          2698.3429      2494.8216
Monotonicity                      Not monotonic  Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common Values

Dataset A
Value                 Count   Frequency (%)
7.75                  21      4.7%
7.8958                20      4.5%
13                    16      3.6%
8.05                  15      3.4%
10.5                  15      3.4%
26                    13      2.9%
7.775                 10      2.2%
26.55                 10      2.2%
7.925                 9       2.0%
8.6625                8       1.8%
Other values (168)    309     69.3%

Dataset B
Value                 Count   Frequency (%)
7.8958                28      6.3%
8.05                  24      5.4%
7.75                  20      4.5%
13                    19      4.3%
10.5                  12      2.7%
26.55                 11      2.5%
0                     9       2.0%
8.6625                8       1.8%
7.925                 8       1.8%
7.2292                8       1.8%
Other values (167)    299     67.0%

Minimum 10 values

Dataset A
Value    Count   Frequency (%)
0        6       1.3%
4.0125   1       0.2%
6.4958   1       0.2%
6.75     1       0.2%
6.95     1       0.2%
6.975    1       0.2%
7.05     5       1.1%
7.0542   2       0.4%
7.125    2       0.4%
7.1417   1       0.2%

Dataset B
Value    Count   Frequency (%)
0        9       2.0%
4.0125   1       0.2%
6.2375   1       0.2%
6.4958   1       0.2%
6.95     1       0.2%
7.05     5       1.1%
7.0542   2       0.4%
7.125    2       0.4%
7.225    5       1.1%
7.2292   8       1.8%

Cabin
Text

               Dataset A    Dataset B
Distinct       82           79
Distinct (%)   81.2%        85.9%
Missing        345          354
Missing (%)    77.4%        79.4%
Memory size    7.0 KiB      7.0 KiB

Length

                Dataset A    Dataset B
Max length      15           15
Median length   3            3
Mean length     3.6633663    3.5434783
Min length      1            1

Characters and Unicode

                      Dataset A    Dataset B
Total characters      370          326
Distinct characters   18           19
Distinct categories   1            1
Distinct scripts      1            1
Distinct blocks       1            1

Unique

             Dataset A    Dataset B
Unique       68           67
Unique (%)   67.3%        72.8%
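Cabin is missing in most rows, and its ratios match a denominator of non-missing values (101 in Dataset A, 92 in Dataset B) rather than 446: Distinct (%), Unique (%) and Mean length all divide by that count. The sketch below recomputes them and uses a small hypothetical list to show how Distinct and Unique differ. Reading Unique as "values that occur exactly once" is an assumption, consistent with Name (every value unique, so Unique equals Distinct) and Ticket (Unique below Distinct). Names are illustrative.

```python
from collections import Counter

N_ROWS = 446
# Cabin summary values copied from the tables above.
CABIN = {
    "Dataset A": {"missing": 345, "distinct": 82, "unique": 68, "total_chars": 370},
    "Dataset B": {"missing": 354, "distinct": 79, "unique": 67, "total_chars": 326},
}


def distinct_and_unique(values):
    """Distinct = number of different values; unique = values occurring exactly once.
    Missing values (None) are ignored."""
    counts = Counter(v for v in values if v is not None)
    return len(counts), sum(1 for c in counts.values() if c == 1)


def cabin_ratios(s, n_rows=N_ROWS):
    """Distinct (%), Unique (%) and mean length relative to non-missing values."""
    present = n_rows - s["missing"]
    return {
        "present": present,
        "distinct_pct": 100 * s["distinct"] / present,
        "unique_pct": 100 * s["unique"] / present,
        "mean_length": s["total_chars"] / present,
    }


# Hypothetical toy column: C22 appears twice, so it is distinct but not unique.
print(distinct_and_unique(["C22", "C22", "B96", None, "E77"]))  # (3, 2)
for name, summary in CABIN.items():
    print(name, {k: round(v, 4) for k, v in cabin_ratios(summary).items()})
```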

Sample

          Dataset A    Dataset B
1st row   F33          E77
2nd row   E44          D20
3rd row   D11          D17
4th row   F4           D9
5th row   A14          C125

Words

Dataset A
Value                 Count   Frequency (%)
b96                   4       3.3%
b98                   4       3.3%
c22                   3       2.5%
c26                   3       2.5%
c23                   3       2.5%
c25                   3       2.5%
c27                   3       2.5%
g6                    3       2.5%
b28                   2       1.7%
c52                   2       1.7%
Other values (81)     90      75.0%

Dataset B
Value                 Count   Frequency (%)
c22                   3       2.8%
c26                   3       2.8%
d17                   2       1.9%
d20                   2       1.9%
e67                   2       1.9%
g6                    2       1.9%
d33                   2       1.9%
c124                  2       1.9%
b49                   2       1.9%
b96                   2       1.9%
Other values (80)     84      79.2%

Most occurring characters

Dataset A
Value                 Count   Frequency (%)
2                     47      12.7%
C                     39      10.5%
6                     29      7.8%
B                     29      7.8%
1                     28      7.6%
3                     21      5.7%
8                     20      5.4%
7                     20      5.4%
(space)               19      5.1%
4                     18      4.9%
Other values (8)      100     27.0%

Dataset B
Value                 Count   Frequency (%)
2                     37      11.3%
C                     34      10.4%
B                     30      9.2%
6                     25      7.7%
1                     24      7.4%
5                     19      5.8%
3                     19      5.8%
7                     18      5.5%
8                     17      5.2%
E                     17      5.2%
Other values (9)      86      26.4%

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   370     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   326     100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   370     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   326     100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   370     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   326     100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters tables above.

Embarked
Categorical

               Dataset A    Dataset B
Distinct       3            3
Distinct (%)   0.7%         0.7%
Missing        2            1
Missing (%)    0.4%         0.2%
Memory size    7.0 KiB      7.0 KiB
Value   Dataset A    Dataset B
S       325          334
C       74           74
Q       45           37

Length

                Dataset A    Dataset B
Max length      1            1
Median length   1            1
Mean length     1            1
Min length      1            1

Characters and Unicode

                      Dataset A    Dataset B
Total characters      444          445
Distinct characters   3            3
Distinct categories   1            1
Distinct scripts      1            1
Distinct blocks       1            1

Unique

             Dataset A    Dataset B
Unique       0            0
Unique (%)   0.0%         0.0%

Sample

          Dataset A    Dataset B
1st row   S            Q
2nd row   C            S
3rd row   S            S
4th row   S            C
5th row   S            S

Common Values

Dataset A
Value       Count   Frequency (%)
S           325     72.9%
C           74      16.6%
Q           45      10.1%
(Missing)   2       0.4%

Dataset B
Value       Count   Frequency (%)
S           334     74.9%
C           74      16.6%
Q           37      8.3%
(Missing)   1       0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Dataset A
Value   Count   Frequency (%)
s       325     73.2%
c       74      16.7%
q       45      10.1%

Dataset B
Value   Count   Frequency (%)
s       334     75.1%
c       74      16.6%
q       37      8.3%

Most occurring characters

Dataset A
Value   Count   Frequency (%)
S       325     73.2%
C       74      16.7%
Q       45      10.1%

Dataset B
Value   Count   Frequency (%)
S       334     75.1%
C       74      16.6%
Q       37      8.3%
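The same Embarked count appears with two different percentages (for example S is 72.9% in Common Values but 73.2% here) because the denominators differ. The Common Values table divides by all 446 rows, so the missing values keep their own share. The character tables divide by the non-missing characters: 444 in Dataset A and 445 in Dataset B. A short sketch reproducing both sets of figures; the counts are copied from this report and the helper name is illustrative.

```python
# Embarked counts copied from the tables above; missing values are excluded.
N_ROWS = 446
EMBARKED = {
    "Dataset A": {"S": 325, "C": 74, "Q": 45},
    "Dataset B": {"S": 334, "C": 74, "Q": 37},
}


def frequencies(counts, denominator):
    """Percentage of each value relative to the given denominator, rounded to 0.1."""
    return {value: round(100 * n / denominator, 1) for value, n in counts.items()}


for name, counts in EMBARKED.items():
    present = sum(counts.values())  # 444 in Dataset A, 445 in Dataset B
    print(name)
    print("  relative to all rows:     ", frequencies(counts, N_ROWS))
    print("  relative to non-missing:  ", frequencies(counts, present))
```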

Most occurring categories

Dataset A
Value       Count   Frequency (%)
(unknown)   444     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   445     100.0%

Most frequent character per category

(unknown): identical to the Most occurring characters tables above.

Most occurring scripts

Dataset A
Value       Count   Frequency (%)
(unknown)   444     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   445     100.0%

Most frequent character per script

(unknown): identical to the Most occurring characters tables above.

Most occurring blocks

Dataset A
Value       Count   Frequency (%)
(unknown)   444     100.0%

Dataset B
Value       Count   Frequency (%)
(unknown)   445     100.0%

Most frequent character per block

(unknown): identical to the Most occurring characters tables above.

Interactions

[Figures: interaction plots, each shown for Dataset A and Dataset B]

Dataset A

2024-10-16T08:47:12.184468image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset B

2024-10-16T08:47:15.302933image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset A

2024-10-16T08:47:12.642250image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset B

2024-10-16T08:47:15.778169image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Correlations

Dataset A

[Figure: correlation heatmap]

Dataset B

[Figure: correlation heatmap]

Dataset A

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.000   0.147  -0.309        0.098   0.277   0.089  -0.144     0.233
Embarked      0.000     1.000   0.162   0.085        0.000   0.258   0.039   0.000     0.145
Fare          0.147     0.162   1.000   0.436        0.012   0.480   0.155   0.451     0.283
Parch        -0.309     0.085   0.436   1.000       -0.014   0.000   0.198   0.468     0.165
PassengerId   0.098     0.000   0.012  -0.014        1.000   0.000   0.056  -0.084     0.096
Pclass        0.277     0.258   0.480   0.000        0.000   1.000   0.145   0.124     0.343
Sex           0.089     0.039   0.155   0.198        0.056   0.145   1.000   0.146     0.552
SibSp        -0.144     0.000   0.451   0.468       -0.084   0.124   0.146   1.000     0.130
Survived      0.233     0.145   0.283   0.165        0.096   0.343   0.552   0.130     1.000

Dataset B

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.181   0.120  -0.242        0.072   0.277   0.072  -0.164     0.154
Embarked      0.181     1.000   0.200   0.000        0.000   0.219   0.144   0.073     0.219
Fare          0.120     0.200   1.000   0.442       -0.004   0.465   0.197   0.523     0.311
Parch        -0.242     0.000   0.442   1.000        0.016   0.000   0.312   0.472     0.149
PassengerId   0.072     0.000  -0.004   0.016        1.000   0.000   0.000  -0.047     0.087
Pclass        0.277     0.219   0.465   0.000        0.000   1.000   0.062   0.179     0.357
Sex           0.072     0.144   0.197   0.312        0.000   0.062   1.000   0.294     0.495
SibSp        -0.164     0.073   0.523   0.472       -0.047   0.179   0.294   1.000     0.218
Survived      0.154     0.219   0.311   0.149        0.087   0.357   0.495   0.218     1.000
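The overview's alerts flag Sex/Survived as highly correlated in Dataset A and Fare/SibSp in Dataset B. A minimal sketch that cross-checks those alerts against the two matrices above: it scans the upper triangle of each symmetric matrix and keeps pairs whose absolute coefficient exceeds a cutoff. The cutoff of 0.5 is an assumption (believed to be ydata-profiling's default for this alert, but not stated in the report), and `high_correlations` is a hypothetical helper, not part of the library.

```python
# Correlation matrices transcribed from the Dataset A and Dataset B tables.
COLUMNS = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
           "Pclass", "Sex", "SibSp", "Survived"]

CORR_A = [
    [1.000, 0.000, 0.147, -0.309, 0.098, 0.277, 0.089, -0.144, 0.233],
    [0.000, 1.000, 0.162, 0.085, 0.000, 0.258, 0.039, 0.000, 0.145],
    [0.147, 0.162, 1.000, 0.436, 0.012, 0.480, 0.155, 0.451, 0.283],
    [-0.309, 0.085, 0.436, 1.000, -0.014, 0.000, 0.198, 0.468, 0.165],
    [0.098, 0.000, 0.012, -0.014, 1.000, 0.000, 0.056, -0.084, 0.096],
    [0.277, 0.258, 0.480, 0.000, 0.000, 1.000, 0.145, 0.124, 0.343],
    [0.089, 0.039, 0.155, 0.198, 0.056, 0.145, 1.000, 0.146, 0.552],
    [-0.144, 0.000, 0.451, 0.468, -0.084, 0.124, 0.146, 1.000, 0.130],
    [0.233, 0.145, 0.283, 0.165, 0.096, 0.343, 0.552, 0.130, 1.000],
]

CORR_B = [
    [1.000, 0.181, 0.120, -0.242, 0.072, 0.277, 0.072, -0.164, 0.154],
    [0.181, 1.000, 0.200, 0.000, 0.000, 0.219, 0.144, 0.073, 0.219],
    [0.120, 0.200, 1.000, 0.442, -0.004, 0.465, 0.197, 0.523, 0.311],
    [-0.242, 0.000, 0.442, 1.000, 0.016, 0.000, 0.312, 0.472, 0.149],
    [0.072, 0.000, -0.004, 0.016, 1.000, 0.000, 0.000, -0.047, 0.087],
    [0.277, 0.219, 0.465, 0.000, 0.000, 1.000, 0.062, 0.179, 0.357],
    [0.072, 0.144, 0.197, 0.312, 0.000, 0.062, 1.000, 0.294, 0.495],
    [-0.164, 0.073, 0.523, 0.472, -0.047, 0.179, 0.294, 1.000, 0.218],
    [0.154, 0.219, 0.311, 0.149, 0.087, 0.357, 0.495, 0.218, 1.000],
]


def high_correlations(matrix, columns, threshold=0.5):
    """Return (column, column, value) for each pair above the cutoff.

    Only the upper triangle is scanned because the matrix is symmetric,
    so each pair is reported once. The 0.5 default is an assumed
    threshold, not one read from the report.
    """
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            value = matrix[i][j]
            if abs(value) > threshold:
                pairs.append((columns[i], columns[j], value))
    return pairs


print("Dataset A:", high_correlations(CORR_A, COLUMNS))
print("Dataset B:", high_correlations(CORR_B, COLUMNS))
```

With the transcribed values this reports only Sex/Survived (0.552) for Dataset A and only Fare/SibSp (0.523) for Dataset B, matching the alerts; Sex/Survived in Dataset B (0.495) falls just under the assumed cutoff, which would explain why that alert is absent there.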

Missing values

A simple visualization of nullity by column.

Dataset A

[Figure: nullity by column]

Dataset B

[Figure: nullity by column]
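The count view tallies, per column, how many cells are present and how many are missing. A minimal sketch of that tally, assuming the figure plots non-missing counts per column; it uses the Age and Cabin values from the first Dataset A sample table, with `None` standing in for NaN. `nullity_counts` is a hypothetical helper, and the report itself works over all 446 rows rather than a 10-row sample.

```python
# Age and Cabin from the first Dataset A sample table, top to bottom.
# None stands in for the NaN cells shown in the report.
SAMPLE_A = {
    "Age": [34.0, 4.0, 7.0, 21.0, 22.0, 23.0, 31.0, 39.0, None, 34.0],
    "Cabin": ["F33", None, None, None, None, None, None, None, None, None],
}


def nullity_counts(columns):
    """Map each column name to a (present, missing) pair of cell counts."""
    counts = {}
    for name, values in columns.items():
        missing = sum(1 for value in values if value is None)
        counts[name] = (len(values) - missing, missing)
    return counts


# Crude text bar chart: one '#' per present cell.
for name, (present, missing) in nullity_counts(SAMPLE_A).items():
    print(f"{name:<6}{'#' * present:<10} present={present} missing={missing}")
```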

The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completeness.

Dataset A

[Figure: nullity matrix]

Dataset B

[Figure: nullity matrix]
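A nullity matrix has one row per record and one column per variable, with present cells filled in and missing cells left blank. A text-only sketch of the same layout, using the Age, Cabin and Embarked values from the first Dataset A sample table (`None` for NaN); `nullity_matrix` is a hypothetical helper, and the `#`/`.` characters are an arbitrary choice standing in for filled and blank cells.

```python
# Age, Cabin and Embarked from the first Dataset A sample table.
# None stands in for the NaN cells shown in the report.
COLUMNS = ["Age", "Cabin", "Embarked"]
ROWS = [
    {"Age": 34.0, "Cabin": "F33", "Embarked": "S"},
    {"Age": 4.0, "Cabin": None, "Embarked": "C"},
    {"Age": 7.0, "Cabin": None, "Embarked": "S"},
    {"Age": 21.0, "Cabin": None, "Embarked": "S"},
    {"Age": 22.0, "Cabin": None, "Embarked": "S"},
    {"Age": 23.0, "Cabin": None, "Embarked": "S"},
    {"Age": 31.0, "Cabin": None, "Embarked": "C"},
    {"Age": 39.0, "Cabin": None, "Embarked": "Q"},
    {"Age": None, "Cabin": None, "Embarked": "S"},
    {"Age": 34.0, "Cabin": None, "Embarked": "S"},
]


def nullity_matrix(rows, columns):
    """Return one string per record: '#' for a present cell, '.' for a missing one."""
    return ["".join("." if row[col] is None else "#" for col in columns)
            for row in rows]


for line in nullity_matrix(ROWS, COLUMNS):
    print(line)
```

The long run of `#.#` lines is the pattern the real matrix makes visible at scale: Cabin is mostly empty while Age and Embarked are mostly filled.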

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset A

[Figure: nullity correlation heatmap]

Dataset B

[Figure: nullity correlation heatmap]
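Nullity correlation compares where two columns are missing. The sketch below assumes it is a Pearson coefficient computed on 0/1 missingness indicators (1 where the cell is missing); that matches our understanding of the missingno heatmap these captions appear to come from, but treat it as an assumption. The data are again from the first Dataset A sample table, and `nullity_correlation` is a hypothetical helper.

```python
import math

# Age, Cabin and Embarked from the first Dataset A sample table (top to bottom).
# None stands in for the NaN cells shown in the report.
AGE = [34.0, 4.0, 7.0, 21.0, 22.0, 23.0, 31.0, 39.0, None, 34.0]
CABIN = ["F33", None, None, None, None, None, None, None, None, None]
EMBARKED = ["S", "C", "S", "S", "S", "S", "C", "Q", "S", "S"]


def nullity_correlation(a, b):
    """Pearson correlation between the missingness indicators of two columns.

    Each column becomes a 0/1 vector (1 = missing). Returns None when either
    vector is constant (always or never missing), since the coefficient is
    undefined in that case.
    """
    x = [1.0 if value is None else 0.0 for value in a]
    y = [1.0 if value is None else 0.0 for value in b]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if spread_x == 0 or spread_y == 0:
        return None
    return cov / (spread_x * spread_y)


print(round(nullity_correlation(AGE, CABIN), 3))  # 0.111
print(nullity_correlation(AGE, EMBARKED))         # None: Embarked is never missing here
```

On ten rows the value (about 0.11) is only illustrative; the report's heatmap is computed over every record.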

Sample

Dataset A

     PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket           Fare      Cabin  Embarked
516  517          1         2       Lemore, Mrs. (Amelia Milley)                     female  34.0  0      0      C.A. 34260       10.5000   F33    S
691  692          1         3       Karun, Miss. Manca                               female  4.0   0      1      349256           13.4167   NaN    C
535  536          1         2       Hart, Miss. Eva Miriam                           female  7.0   0      2      F.C.C. 13529     26.2500   NaN    S
494  495          0         3       Stanley, Mr. Edward Roland                       male    21.0  0      0      A/4 45380        8.0500    NaN    S
212  213          0         3       Perkin, Mr. John Henry                           male    22.0  0      0      A/5 21174        7.2500    NaN    S
398  399          0         2       Pain, Dr. Alfred                                 male    23.0  0      0      244278           10.5000   NaN    S
817  818          0         2       Mallet, Mr. Albert                               male    31.0  1      1      S.C./PARIS 2079  37.0042   NaN    C
885  886          0         3       Rice, Mrs. William (Margaret Norton)             female  39.0  0      5      382652           29.1250   NaN    Q
107  108          1         3       Moss, Mr. Albert Johan                           male    NaN   0      0      312991           7.7750    NaN    S
576  577          1         2       Garside, Miss. Ethel                             female  34.0  0      0      243880           13.0000   NaN    S

Dataset B

     PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket           Fare      Cabin  Embarked
156  157          1         3       Gilnagh, Miss. Katherine "Katie"                 female  16.0  0      0      35851            7.7333    NaN    Q
221  222          0         2       Bracken, Mr. James H                             male    27.0  0      0      220367           13.0000   NaN    S
772  773          0         2       Mack, Mrs. (Mary)                                female  57.0  0      0      S.O./P.P. 3      10.5000   E77    S
874  875          1         2       Abelson, Mrs. Samuel (Hannah Wizosky)            female  28.0  1      0      P/PP 3381        24.0000   NaN    C
385  386          0         2       Davies, Mr. Charles Henry                        male    18.0  0      0      S.O.C. 14879     73.5000   NaN    S
695  696          0         2       Chapman, Mr. Charles Henry                       male    52.0  0      0      248731           13.5000   NaN    S
570  571          1         2       Harris, Mr. George                               male    62.0  0      0      S.W./PP 752      10.5000   NaN    S
787  788          0         3       Rice, Master. George Hugh                        male    8.0   4      1      382652           29.1250   NaN    Q
593  594          0         3       Bourke, Miss. Mary                               female  NaN   0      2      364848           7.7500    NaN    Q
157  158          0         3       Corn, Mr. Harry                                  male    30.0  0      0      SOTON/OQ 392090  8.0500    NaN    S

Dataset A

     PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket           Fare      Cabin  Embarked
887  888          1         1       Graham, Miss. Margaret Edith                     female  19.0  0      0      112053           30.0000   B42    S
373  374          0         1       Ringhini, Mr. Sante                              male    22.0  0      0      PC 17760         135.6333  NaN    C
172  173          1         3       Johnson, Miss. Eleanor Ileen                     female  1.0   1      1      347742           11.1333   NaN    S
308  309          0         2       Abelson, Mr. Samuel                              male    30.0  1      0      P/PP 3381        24.0000   NaN    C
627  628          1         1       Longley, Miss. Gretchen Fiske                    female  21.0  0      0      13502            77.9583   D9     S
502  503          0         3       O'Sullivan, Miss. Bridget Mary                   female  NaN   0      0      330909           7.6292    NaN    Q
448  449          1         3       Baclini, Miss. Marie Catherine                   female  5.0   2      1      2666             19.2583   NaN    C
468  469          0         3       Scanlan, Mr. James                               male    NaN   0      0      36209            7.7250    NaN    Q
617  618          0         3       Lobb, Mrs. William Arthur (Cordelia K Stanlick)  female  26.0  1      0      A/5. 3336        16.1000   NaN    S
766  767          0         1       Brewe, Dr. Arthur Jackson                        male    NaN   0      0      112379           39.6000   NaN    C

Dataset B

     PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket           Fare      Cabin  Embarked
71   72           0         3       Goodwin, Miss. Lillian Amy                       female  16.0  5      2      CA 2144          46.9000   NaN    S
279  280          1         3       Abbott, Mrs. Stanton (Rosa Hunt)                 female  35.0  1      1      C.A. 2673        20.2500   NaN    S
476  477          0         2       Renouf, Mr. Peter Henry                          male    34.0  1      0      31027            21.0000   NaN    S
494  495          0         3       Stanley, Mr. Edward Roland                       male    21.0  0      0      A/4 45380        8.0500    NaN    S
633  634          0         1       Parr, Mr. William Henry Marsh                    male    NaN   0      0      112052           0.0000    NaN    S
617  618          0         3       Lobb, Mrs. William Arthur (Cordelia K Stanlick)  female  26.0  1      0      A/5. 3336        16.1000   NaN    S
226  227          1         2       Mellors, Mr. William John                        male    19.0  0      0      SW/PP 751        10.5000   NaN    S
296  297          0         3       Hanna, Mr. Mansour                               male    23.5  0      0      2693             7.2292    NaN    C
876  877          0         3       Gustafsson, Mr. Alfred Ossian                    male    20.0  0      0      7534             9.8458    NaN    S
737  738          1         1       Lesurer, Mr. Gustave J                           male    35.0  0      0      PC 17755         512.3292  B101   C

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
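Neither dataset has duplicate rows, which is consistent with PassengerId being unique in both. A duplicate count of the kind the "# duplicates" column would hold can be sketched with `collections.Counter`: count identical records and keep those seen more than once. For illustration only, the input pools a few records that appear in both the Dataset A and Dataset B sample tables, trimmed to four columns; pooling the datasets is a hypothetical scenario, and `duplicate_rows` is a hypothetical helper.

```python
from collections import Counter

# (PassengerId, Name, Ticket, Fare) for a few records from the sample tables.
# Stanley and Lobb appear in both the Dataset A and Dataset B samples.
POOLED = [
    (495, "Stanley, Mr. Edward Roland", "A/4 45380", 8.05),
    (618, "Lobb, Mrs. William Arthur (Cordelia K Stanlick)", "A/5. 3336", 16.1),
    (309, "Abelson, Mr. Samuel", "P/PP 3381", 24.0),
    (495, "Stanley, Mr. Edward Roland", "A/4 45380", 8.05),
    (618, "Lobb, Mrs. William Arthur (Cordelia K Stanlick)", "A/5. 3336", 16.1),
    (875, "Abelson, Mrs. Samuel (Hannah Wizosky)", "P/PP 3381", 24.0),
]


def duplicate_rows(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


for row, n in duplicate_rows(POOLED).items():
    print(n, row)
```

The two Abelson records share a ticket and fare but differ in PassengerId and Name, so they are not duplicates; only exact matches across every kept column count. In pandas, `DataFrame.duplicated(keep=False)` flags the same rows.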